[Day11] Multiple Linear Regression (03)

14th Ironman Contest, machine learning

ironcat45

2022-09-26 21:27:02


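Methods for building a model

The previous post covered only the first three of the five model-building methods.
This post finishes the remaining two.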

1. All-In
2. Backward elimination
3. Forward selection
4. Bidirectional elimination
5. Score comparison

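4. Bidirectional elimination

Also known as stepwise regression.
This method combines forward selection and backward elimination.
The basic idea is to run a forward selection step first, adding the most influential variable to the model.
Then check whether any variable already in the model is no longer significant now that the new variable has been added; if so, remove it.
This gives variables that were selected earlier a chance to be dropped again.
The goal is to keep the influential variables in the final model, although as a greedy heuristic the method does not guarantee the best possible subset.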

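Now for the steps to build the model:
Step 1: Select a significance level to enter and to stay in the model (SLstay = 0.05, SLenter = 0.05)
Set two thresholds that decide whether a variable is removed or added:
SLstay: whether an existing variable should be removed
SLenter: whether a new variable should be added

Step 2: Perform the next step of forward selection
Run one forward selection step.
A new variable whose p-value < SLenter can be added to the model.

Step 3: Perform all steps of backward elimination
Then run backward elimination.
An existing variable whose p-value < SLstay stays in the model; otherwise it is removed.

Step 4: Repeat steps 2 and 3 until no new variables can enter and no old variables can exit
Once a full pass adds nothing and removes nothing, the model is final.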
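
The four steps above can be sketched as a small loop. This is only an illustration under stated assumptions, not a definitive implementation: the function name `bidirectional_elimination`, the `p_values` callback, and `max_iter` are hypothetical names introduced here, and `toy_p_values` returns hand-made numbers rather than p-values from a real fit. In practice the callback would refit the regression for the given variables and return the fitted p-values (for example the `pvalues` of a statsmodels OLS result).

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical callback type: given the variables in a model,
# return the p-value of each of those variables.
PValueFn = Callable[[Sequence[str]], Dict[str, float]]


def bidirectional_elimination(
    candidates: Sequence[str],
    p_values: PValueFn,
    sl_enter: float = 0.05,
    sl_stay: float = 0.05,
    max_iter: int = 100,  # assumption: a guard against cycling
) -> List[str]:
    selected: List[str] = []
    for _ in range(max_iter):
        changed = False

        # Step 2: one forward step. Try each remaining variable and
        # add the one with the lowest p-value, if it is below sl_enter.
        best_name, best_p = None, sl_enter
        for name in candidates:
            if name in selected:
                continue
            p = p_values(selected + [name])[name]
            if p < best_p:
                best_name, best_p = name, p
        if best_name is not None:
            selected.append(best_name)
            changed = True

        # Step 3: all backward steps. Keep dropping the least
        # significant variable while its p-value is not below sl_stay.
        while selected:
            pvals = p_values(selected)
            worst = max(selected, key=lambda n: pvals[n])
            if pvals[worst] < sl_stay:
                break
            selected.remove(worst)
            changed = True

        # Step 4: stop when nothing entered and nothing exited.
        if not changed:
            break
    return selected


def toy_p_values(model: Sequence[str]) -> Dict[str, float]:
    """Hand-made p-values for illustration only (not from real data).

    x1 looks significant on its own but stops being significant
    once both x2 and x3 are in the model.
    """
    fixed = {"x2": 0.001, "x3": 0.01, "x4": 0.60}
    pvals = {name: fixed[name] for name in model if name in fixed}
    if "x1" in model:
        overlap = len({"x2", "x3"} & set(model))
        pvals["x1"] = {0: 0.0005, 1: 0.02, 2: 0.30}[overlap]
    return pvals


result = bidirectional_elimination(["x1", "x2", "x3", "x4"], toy_p_values)
print(result)  # ['x2', 'x3']
```

In this toy run x1 enters first, then gets removed after x3 enters, which is exactly the behaviour that distinguishes this method from plain forward selection.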

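5. Score comparison

Score: a way of rating a multiple linear regression model. For example, one model might score 90 and another 100.
Now for the steps to build the model:
Step 1: Select a criterion of goodness of fit (e.g. the Akaike criterion, which is one such scoring system)
Pick a scoring system.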
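
The post only names the Akaike criterion, so for reference here is its standard definition. The symbols are introduced here and are not from the original text: k is the number of estimated parameters, \hat{L} is the maximized likelihood, n is the number of observations, and RSS is the residual sum of squares. The second line is the form commonly used for least-squares regression with Gaussian errors, and holds up to an additive constant that is the same for every model fitted to the same data.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}
\qquad\text{and, for least squares,}\qquad
\mathrm{AIC} = n\ln\!\left(\frac{\mathrm{RSS}}{n}\right) + 2k + \text{const}
```

A lower AIC indicates a better trade-off between fit and model size.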

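Step 2: Construct all possible regression models: 2^N - 1 total combinations
Score every possible model.
With N variables there are 2^N - 1 different multiple linear regression models in total.
Each variable is either included or left out (2 choices), which gives 2^N subsets; dropping the empty subset, which contains no variables, leaves 2^N - 1.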
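
The count in step 2 can be checked by enumerating the subsets directly. This is a minimal sketch; the helper name `all_models` and the variable names `x1` to `x3` are made up for illustration.

```python
from itertools import combinations


def all_models(variables):
    """Yield every non-empty subset of the candidate variables."""
    for size in range(1, len(variables) + 1):
        yield from combinations(variables, size)


models = list(all_models(["x1", "x2", "x3"]))
print(len(models))  # 7, i.e. 2**3 - 1
for model in models:
    print(model)
```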

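Step 3: Select the one with the best criterion
Score the models one by one and pick the one with the best score. With the Akaike criterion lower is better, so that means the model with the lowest value.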
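
Steps 1 to 3 can be put together in a small pure-Python sketch. Everything named here is an assumption for illustration: the helpers `solve`, `aic`, and `best_subset` are hypothetical, the score is the least-squares form AIC = n ln(RSS/n) + 2k with k counting the intercept, and the data are synthetic (y is built from x1 and x2 plus a fixed noise pattern, while x3 is irrelevant). It is not necessarily how the next post implements the method.

```python
import math
from itertools import combinations


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def aic(columns, y):
    """AIC (up to a constant) of an OLS fit with intercept."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])  # number of coefficients, intercept included
    # Normal equations: (X^T X) beta = X^T y
    XtX = [[sum(r[a] * r[b] for r in X) for b in range(k)] for a in range(k)]
    Xty = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(k)]
    beta = solve(XtX, Xty)
    rss = sum(
        (yi - sum(bj * xj for bj, xj in zip(beta, r))) ** 2
        for r, yi in zip(X, y)
    )
    return n * math.log(rss / n) + 2 * k


def best_subset(features, y):
    """Score every non-empty subset and return (lowest AIC, subset)."""
    names = list(features)
    scored = []
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            scored.append((aic([features[n] for n in combo], y), combo))
    return min(scored)


# Synthetic data, made up for this example only.
features = {
    "x1": [1, 1, 1, 1, -1, -1, -1, -1],
    "x2": [1, 1, -1, -1, 1, 1, -1, -1],
    "x3": [1, -1, -1, 1, 1, -1, -1, 1],  # unrelated to y
}
noise = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5]
y = [
    3 + 2 * a + b + e
    for a, b, e in zip(features["x1"], features["x2"], noise)
]

score, chosen = best_subset(features, y)
print(chosen)  # ('x1', 'x2')
```

Adding x3 to the chosen model does not reduce the residuals here, so the larger model only pays the extra penalty of 2 and loses the comparison.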

FIN: Your model is ready

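This method is conceptually simple, but the amount of computation becomes enormous as the number of independent variables grows.
Example:
10 columns means 2^10 - 1 = 1023 models
The more independent variables there are, the more models there are, and the growth is exponential.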

The next post will cover how to implement these methods in Python.